SUMMARY

This is a second report to the Mathematical Analysis Division of the National Highway Traffic Safety Administration (NHTSA) on the subject of forecasting annual highway fatalities. This report compares several time series analysis programs based on exponential smoothing with the non-decompositional methods currently employed by NHTSA for projecting the annual traffic fatalities for the entire U.S. Several methods of data aggregation are studied.

It is found that there is some advantage in using lumped (pooled) data for each region, aggregated either quarterly or half-yearly, and in using the Sum of Regional estimates to estimate the national value.

Also, there does not appear to be any great difference between the results obtained using the non-decompositional methods and those obtained by time-series analysis programs based on exponential smoothing.

Estimates of the 1972 and 1973 national traffic fatalities were made by a variety of methods. For 1973 the estimates ranged from a low of 54186 to a high of 55994, with a mean of 55055.
LIST OF SYMBOLS AND FORMULAS

$S$ = the actual (observed) value of the time series datum (in this case fatalities). This may be a monthly, quarterly, half-year or yearly value, depending on the context

$S^*$ = the actual value for a calendar year

$\hat{S}$ = an estimate of $S$

$\Delta = \hat{S} - S$; if $\Delta > 0$, the estimate is above the actual

$\Delta_6$ = the error in the forecast for the last 6 months of the calendar year (Table 1)

$\Delta/S^*$ = the (signed) fractional error; the mean of its absolute value is the mean absolute fractional error

$m_0$ = the current total of fatalities for the specified months from which calendar-year forecasts are made. This may be for 3, 6 or 9 months; in this report a 6-month value is customary

$m_{-i}$ = the corresponding total of fatalities for the specified months of the $i$th previous year

$Y_{-i}$ = the total fatalities for the $i$th previous year; $i = 0, 1, 2, \ldots$

$\hat{Y}_j$ = an estimate of fatalities for year $j$

$K$ = the number of years of data over which summation occurs

$N$ = an NHTSA forecast method

$N_L^{(K)}$ = an NHTSA forecast method applied to the Lumping mode using $K$ years of past data

$N_S^{(K)}$ = an NHTSA forecast method applied to the Sum of States mode using $K$ years of past data

M = monthly data

Q = quarterly data

H = half-year data

Y = yearly data

L(M) = a lumping mode (for a region or the U.S.) using monthly data. Monthly data from the individual states constituting a region are added together to create a time series for the region

SS(Q) = a Sum of States mode, which consists of making individual analyses and predictions for each state, summing the results over the states constituting a region, and then summing the results for all the regions to obtain a forecast for the national value; in this instance quarterly data are indicated

$\bar{L} = \frac{1}{4}[L(M) + L(Q) + L(H) + L(Y)]$ = an average of lumping methods, or $\bar{L} = \frac{1}{3}[L(Q) + L(H) + L(Y)]$ when only quarterly, half-year and yearly data are used

Σ Regions = a forecast for the national value obtained by summing the individual forecasts for each region

Entire U.S. = a forecast for the national value obtained from one grand lumped time series for the nation

X = a leap-year adjustment factor
NHTSA FORECAST METHODS

Modified NHTSA Forecast Methods

Note that any of the methods with a superscript (K) will yield K estimates, say

$\hat{Y}^{(1)}, \hat{Y}^{(2)}, \ldots, \hat{Y}^{(K)}$,

and that these may be averaged to produce still other estimates.

In table headings:

GEXS = GEXSMO time series analysis computer program
EXPS = EXPSMOOTHING time series analysis computer program
MAE = mean absolute fractional error
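The error measures defined above are simple to compute. The following is a minimal Python sketch; the function names are illustrative and are not taken from GEXSMO, EXPSMOOTHING or the NHTSA programs. The example values are the lowest, highest and mean NHTSA Entire U.S. forecasts for 1972 quoted in Section 2, compared with the observed 1972 total of 54889 (Table 4).

```python
from statistics import mean

# Illustrative helper names; not taken from the NBS or NHTSA programs.


def fractional_error(estimate: float, actual: float) -> float:
    """Signed fractional error Delta/S*, where Delta = S_hat - S.

    A positive value means the estimate is above the actual value.
    """
    return (estimate - actual) / actual


def mean_absolute_fractional_error(estimates, actuals) -> float:
    """MAE: the mean of |Delta/S*| over paired estimates and actual values."""
    estimates, actuals = list(estimates), list(actuals)
    if len(estimates) != len(actuals) or not estimates:
        raise ValueError("need equally long, non-empty sequences")
    return mean(abs(fractional_error(e, a)) for e, a in zip(estimates, actuals))


# Lowest, highest and mean NHTSA Entire U.S. forecasts for 1972 against the observed total.
for forecast in (55224, 55930, 55593):
    print(forecast, round(fractional_error(forecast, 54889), 4))
```

Rounded to four places, these reproduce the fractional errors 0.0061, 0.0190 and 0.0128 reported for the Entire U.S. forecasts.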
1.0 INTRODUCTION AND PROBLEM DEFINITIONS

In a previous study¹ for the Mathematical Analysis Division (MAD) of the National Highway Traffic Safety Administration (NHTSA), the Technical Analysis Division compared forecasts of national (entire U.S.) highway fatalities for the current calendar year. Forecasts obtained with readily available computer routines for time series analysis based on exponential smoothing were compared with those from the methods currently employed by NHTSA. The methods were applied to estimate the current-year total of fatalities given the first 3 months of fatality data, then the first 6 months, and finally the first 9 months. That report concluded, in part, that there is no compelling evidence for NHTSA to change its methods when operating on the national highway fatalities time series.

It was pointed out by members of NHTSA² that the cyclic and trend properties of the highway fatality time series appear to differ among the regions and, perhaps, are idiosyncratic from state to state. Thus it appeared possible that some important information is lost when operating on the time series for the entire U.S. To explore this possibility it was suggested that forecasts be made for each region by applying an exponential smoothing method to each region's time series, and that these forecasts be summed to obtain a national forecast. These forecasts could then be compared with the forecasts made on the national time series.

¹U.S. Department of Commerce, National Bureau of Standards, Time Series Forecasting of Highway Accident Fatalities. NBSIR-73-138 (1973).

²Letter from R.J. Taylor, NHTSA, to Alexander Craw, NBS, January 29, 1973, Reference number NU3-31.

Unfortunately, state (and therefore regional) data are available only for traffic fatalities. Non-traffic fatalities are estimated on a national basis for inclusion in the national time series for highway fatalities. Thus the summed-regions forecast must be compared with a forecast derived from the national traffic fatalities time series.

At first, following the procedures of the previous study, the analyses were performed using 12 years and 6 months of data to obtain a forecast for the last 6 months of calendar year 1972. From the known first-6-month totals and the forecasts for the last 6 months, forecasts were obtained for the highway traffic fatalities for the entire calendar year 1972.

Midway through the analyses, we were asked to provide forecasts for calendar year 1973 by each of the several methods under review. This posed no problem for the methods using exponential smoothing. However, because NHTSA's procedures require some data for the first several months of the calendar year, which were not then available, the current NHTSA methods required slight modification. The details of these modifications are discussed in Section 3.


In the first study³ the analyses were based on the first 3 months of calendar-year data, then the first 6 months of data, and finally the first 9 months of data. In the new work, to cut down on the amount of computing required, it was decided to make a comprehensive comparison based on the first 6 months of data for a calendar year. We remark that in the previous work no systematic differences were noted between the results based on 3 and 9 months of data and those based on 6 months of data. Also, rather than carrying out a large number of runs in the time-sharing mode with typewriter output, we decided to make the comparisons using the GEXSMO time-series analysis routine and to make an additional partial comparison using the EXPSMOOTHING⁴ program, operated in a time-share mode but with high-speed output from tape.

The notations adopted for the previous report will be used in this report; any changes or additions will be noted explicitly. (See List of Symbols and Formulas.)

³U.S. Department of Commerce, National Bureau of Standards, Time Series Forecasting of Highway Accident Fatalities. NBSIR-73-138 (1973).

⁴For a discussion of these time series analysis programs, see the discussion and references given in NBSIR-73-138 and the Appendix.
2.0 ANALYSES BASED ON 12 YEARS AND 6 MONTHS OF DATA (12, 6)

Production runs were made using monthly (M), quarterly (Q), half-year (H) and yearly (Y) data for several levels of aggregation and for two different modes of aggregation.

The first mode consists of making individual analyses and predictions for each state, summing the results over the states constituting a region, and then summing the results for all the regions to obtain a forecast for the national value. This mode is denoted by SS(M), SS(Q), ....

The second mode of aggregation is called lumping and is denoted by L(M), L(Q), .... In this mode, a time series for each region was created by lumping (adding) the data from each state in the region. Forecasts were then obtained for each region directly. Finally, forecasts of the national values were obtained by summing the results for the regions. This latter is denoted by Σ Regions.
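The two aggregation modes can be sketched as follows. The `forecast` argument stands in for any single-series method (GEXSMO, EXPSMOOTHING or an NHTSA method); the data layout (a mapping from region to a mapping from state to equally long series), the two toy forecasters and all function names are assumptions of this illustration. For a forecaster that is linear in the data, such as "repeat the last value", the two modes agree exactly; for nonlinear forecasters they generally do not.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical helper names and toy forecasters, for illustration only.
Series = Sequence[float]
Forecaster = Callable[[Series], float]


def sum_of_states(regions: Mapping[str, Mapping[str, Series]], forecast: Forecaster) -> dict:
    """SS mode: forecast each state's own series, then add the forecasts within each region."""
    return {region: sum(forecast(series) for series in states.values())
            for region, states in regions.items()}


def lumped(regions: Mapping[str, Mapping[str, Series]], forecast: Forecaster) -> dict:
    """L mode: add the state series period by period, then forecast the regional series once."""
    result = {}
    for region, states in regions.items():
        regional_series = [sum(values) for values in zip(*states.values())]
        result[region] = forecast(regional_series)
    return result


def sigma_regions(regional_forecasts: Mapping[str, float]) -> float:
    """Sigma Regions: national forecast obtained by summing the regional forecasts."""
    return sum(regional_forecasts.values())


def last_value(series: Series) -> float:
    """Toy linear forecaster: repeat the last observation."""
    return series[-1]


def growth(series: Series) -> float:
    """Toy nonlinear forecaster: repeat the last period-to-period growth ratio."""
    return series[-1] * series[-1] / series[-2]


# Hypothetical data for two regions (not taken from the report).
regions = {"A": {"s1": [1.0, 2.0, 3.0], "s2": [4.0, 5.0, 6.0]},
           "B": {"s3": [10.0, 20.0]}}
print(sigma_regions(sum_of_states(regions, last_value)), sigma_regions(lumped(regions, last_value)))
print(sigma_regions(sum_of_states(regions, growth)), sigma_regions(lumped(regions, growth)))
```

With the linear forecaster both modes give 29.0; with the growth-ratio forecaster the Sum of States and lumped totals differ.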

In the summary tables GEXS is used to denote the GEXSMO program and EXPS the EXPSMOOTHING program. N denotes an NHTSA method. For each region (U.S.), the forecast for the current calendar year from 6 months of data by a method N is independent of how the data for the region (U.S.) are aggregated in time (M, Q, H or Y). However, the forecast for a region (U.S.) obtained by summing the forecasts for its states by a method N does not necessarily equal the forecast for the region (U.S.) obtained from lumped data by method N. Thus, for a fixed K,

$N_L^{(K)} \neq N_S^{(K)}$,

where K denotes the number of years of past data used in the N method, and the subscripts L and S refer to the lumping and Sum of States modes, respectively. (See Appendix A for more detail.)
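The inequality $N_L^{(K)} \neq N_S^{(K)}$ arises because the NHTSA methods are ratio methods, $\hat{Y}_0^{(K)} = m_0 \sum_{i=1}^{K} Y_{-i} / \sum_{i=1}^{K} m_{-i}$, and a sum of ratios is not in general the ratio of the sums. The sketch below applies this ratio form to two states; the state figures, the dictionary layout and the function names are hypothetical.

```python
# Hypothetical helper names and state data, for illustration only.


def ratio_forecast(m0, m_prev, y_prev, k):
    """Ratio form of an NHTSA method: m0 * sum_{i=1}^K Y_{-i} / sum_{i=1}^K m_{-i}.

    m_prev[i - 1] holds m_{-i} and y_prev[i - 1] holds Y_{-i} (most recent year first).
    """
    return m0 * sum(y_prev[:k]) / sum(m_prev[:k])


def n_sum_of_states(states, k):
    """N_S^(K): apply the ratio method to each state, then add the forecasts."""
    return sum(ratio_forecast(s["m0"], s["m_prev"], s["y_prev"], k) for s in states)


def n_lumped(states, k):
    """N_L^(K): add the state data first, then apply the ratio method once."""
    m0 = sum(s["m0"] for s in states)
    m_prev = [sum(v) for v in zip(*(s["m_prev"] for s in states))]
    y_prev = [sum(v) for v in zip(*(s["y_prev"] for s in states))]
    return ratio_forecast(m0, m_prev, y_prev, k)


# Two hypothetical states with one previous year of data (K = 1).
states = [{"m0": 100, "m_prev": [90], "y_prev": [200]},
          {"m0": 50, "m_prev": [60], "y_prev": [100]}]
print(n_sum_of_states(states, 1), n_lumped(states, 1))
```

Here the Sum of States forecast is about 305.6 while the lumped forecast is 300.0, although both use exactly the same data.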
Table 1 contains a summary of the comparisons of forecasts by data features, based on the fractional error criterion $\Delta_6/S^*$. Here $\Delta_6$ is the error in the forecast for the last 6 months of the calendar year and $S^*$ denotes the observed value of the fatalities for the calendar year. The forecasts in Table 1 are for calendar year 1972, based on 12 years and 6 months of data (12, 6). The (signed) fractional error for each region is shown for each mode and method of aggregation.

For the forecasts produced by the GEXSMO program:

1) the best forecasts for regions are by L(Q);

2) the best forecasts for the U.S. are for Σ Regions by L(Q);

3) Σ Regions forecasts are better than the forecasts for the entire U.S. (based on one grand lumping);

4) in general, for regions, the mean absolute fractional error (MAE) for lumped forecasts is less than the MAE for forecasts by the Sum of States mode; the exception is SS(M);

5) in general, for Σ Regions, the error for forecasts by the lumping mode is less than that for forecasts by the Sum of States mode; the exception is SS(M).

Guided by these results, we decided to make some computer runs using a different time-series analysis program, namely EXPSMOOTHING. Rather than run a complete duplication of the GEXSMO runs, it was decided to operate in the lumped (L) mode only and to use only quarterly, half-year and yearly data. The results of these computations are also shown in Table 1.

Of the EXPSMOOTHING forecasts, the best were obtained using half-year data, L(H). For this mode, L(H), the Σ Regions forecast was better than the L(H) forecast for the Entire U.S., although both were very good. For both quarterly and yearly data, however, the Entire U.S. estimates were better than the Σ Regions estimates. The relatively poorer Σ Regions forecast in the case of L(Q) is due primarily to a poor estimate for one region, Region 8. Very poor estimates were obtained for Regions 2, 3 and 8 using L(Y). We remark that in general Region 8 was poorly forecast by all methods.

Using the NHTSA method and K = 12 (see Appendix), the forecasts for Σ Regions by lumping were better than those produced by the Sum of States mode (see Table 2), and the Σ Regions forecast by lumping was better than that for the Entire U.S.

Applying the NHTSA methods to the Entire U.S., it was possible to produce 12 different forecasts, depending on whether 1, 2, ..., or 12 years of past data were employed. These ranged in value from 55224 to 55930, with a mean of 55593, for the U.S. traffic fatalities. The corresponding fractional absolute errors ranged from 0.0061 to 0.0190, with a mean of 0.0128. The values of K corresponding to the lowest and highest of these forecasts were K = 4 and K = 11, respectively, and the K-value corresponding to the mean lies between 6 and 7. (Table 2)

The NHTSA methods⁵ may be called ratio methods, since they may be written

$\dfrac{\hat{Y}_0^{(K)}}{m_0} = \dfrac{\sum_{i=1}^{K} Y_{-i}}{\sum_{i=1}^{K} m_{-i}}$
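Because the formula holds for each K, running it for K = 1, 2, ... yields a family of estimates (twelve for the Entire U.S. runs above), together with their mean and the K values giving the lowest and highest estimates. A minimal sketch; the input figures are hypothetical (the report's yearly and half-year totals are not reproduced here) and the function names are illustrative.

```python
from statistics import mean

# Illustrative helper names; input figures below are hypothetical.


def ratio_forecasts(m0, m_prev, y_prev):
    """Estimates m0 * sum_{i=1}^K Y_{-i} / sum_{i=1}^K m_{-i} for K = 1..len(m_prev).

    m_prev[i - 1] is m_{-i} (first-6-month total) and y_prev[i - 1] is Y_{-i}
    (yearly total) for the i-th previous year.
    """
    if len(m_prev) != len(y_prev):
        raise ValueError("m_prev and y_prev must cover the same years")
    estimates, sum_y, sum_m = [], 0.0, 0.0
    for y, m in zip(y_prev, m_prev):
        sum_y += y  # running sum_{i=1}^K Y_{-i}
        sum_m += m  # running sum_{i=1}^K m_{-i}
        estimates.append(m0 * sum_y / sum_m)
    return estimates


def summarize(estimates):
    """Return (K of the lowest estimate, K of the highest estimate, mean over all K)."""
    ks = range(1, len(estimates) + 1)
    low_k = min(ks, key=lambda k: estimates[k - 1])
    high_k = max(ks, key=lambda k: estimates[k - 1])
    return low_k, high_k, mean(estimates)


# Hypothetical figures for two previous years.
est = ratio_forecasts(30.0, m_prev=[25.0, 20.0], y_prev=[50.0, 44.0])
print(est, summarize(est))
```

Accumulating the two sums in one pass gives all K estimates at linear cost, which is convenient when K runs up to 12.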

Forecasts were also made by a difference method. If $m_0$ is the sum of the data for the first 6 months of the current year, then $Y_{-1} - m_{-1}$ is the sum of the data for the last 6 months of the previous year. Results for 1972 by this method were excellent, the fractional absolute error being 0.0005. This appears to be a fortuitous result: when the method was applied to previous years, a wide range of values resulted. (See Table 3.)
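A minimal sketch of the difference method, assuming it takes the form $\hat{Y}_0 - Y_{-1} = m_0 - m_{-1}$, equivalently $\hat{Y}_0 = m_0 + (Y_{-1} - m_{-1})$; that is, the last 6 months of the current year are forecast to equal the last 6 months of the previous year. The back-test helper, which applies the method to earlier years in the manner of Table 3, the names and the sample figures are all hypothetical.

```python
# Assumed form of the difference method; helper names and data are hypothetical.


def difference_forecast(m0, y_prev, m_prev):
    """Assumed difference method: Y_hat_0 = m_0 + (Y_{-1} - m_{-1}).

    m0:     first-6-month total for the current year
    y_prev: yearly total for the previous year, Y_{-1}
    m_prev: first-6-month total for the previous year, m_{-1}
    """
    return m0 + (y_prev - m_prev)


def backtest(first_half, yearly):
    """Fractional errors of the assumed method for every year after the first.

    first_half[t] and yearly[t] are the first-6-month and yearly totals for year t.
    """
    errors = []
    for t in range(1, len(yearly)):
        forecast = difference_forecast(first_half[t], yearly[t - 1], first_half[t - 1])
        errors.append((forecast - yearly[t]) / yearly[t])
    return errors


# Hypothetical totals for three years.
print(backtest([10, 12, 11], [25, 26, 27]))
```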


⁵George Suzuki points out that the current NHTSA methods utilize the so-called "non-seasonal" or "non-decompositional" procedure. This is in contrast to time-series analysis, which generally goes through a decomposition process to isolate the trend. In this decomposition process the seasonal and other regular variations are identified. The reconstitution of a future time series, i.e., extrapolation, is performed by extrapolating the trend and then imposing the corresponding cyclical variations. Both the GEXSMO and EXPSMOOTHING time-series programs do the latter. (See Appendix.)
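To make "extrapolate the trend, then impose the cyclical variations" concrete, here is additive Holt-Winters smoothing, a textbook seasonal exponential smoothing method that keeps level, trend and seasonal components separate. It is offered only as an illustration of the decompositional idea; it is not claimed to be the algorithm implemented in GEXSMO or EXPSMOOTHING (see the Appendix and NBSIR-73-138 for those). The smoothing constants alpha, beta and gamma, the two-season initialisation and the function name are assumptions of this sketch.

```python
# Textbook additive Holt-Winters; not the GEXSMO or EXPSMOOTHING algorithm.


def holt_winters_additive(y, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing: level + trend + seasonal component.

    Returns forecasts for the `horizon` periods that follow the last value of `y`.
    The first two seasons of data are used to initialise the components.
    """
    n = len(y)
    if n < 2 * period:
        raise ValueError("need at least two full seasons of data")
    # Initial trend from the difference of the first two seasonal means.
    mean1 = sum(y[:period]) / period
    mean2 = sum(y[period:2 * period]) / period
    trend = (mean2 - mean1) / period
    # De-seasonalised level at the end of the first season (t = period - 1).
    level = mean1 + trend * (period - 1) / 2
    # Initial seasonal deviations from the fitted straight line.
    seasonal = [y[j] - (mean1 + trend * (j - (period - 1) / 2)) for j in range(period)]
    # Smooth level, trend and seasonal factors through the series.
    for t in range(period, n):
        s = seasonal[t % period]
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (y[t] - level) + (1 - gamma) * s
    # Extrapolate the trend, then re-impose the seasonal pattern.
    return [level + h * trend + seasonal[(n - 1 + h) % period]
            for h in range(1, horizon + 1)]


# Hypothetical quarterly series: linear trend plus a fixed seasonal pattern.
pattern = [5.0, -3.0, 8.0, -10.0]
y = [100.0 + 2.0 * t + pattern[t % 4] for t in range(12)]
print(holt_winters_additive(y, period=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4))
```

On a series that is exactly a straight line plus a repeating seasonal pattern, this initialisation recovers the components exactly, so the forecasts continue the pattern; on real fatality data the constants would have to be chosen by fitting.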
2.2 A Look to Regions

On the basis of the forecasts for 1972 using 12 years and 6 months of data, it appeared that some improvement could be obtained in estimating national traffic fatalities if the estimates for each region could be improved. There also appears to be some legitimate merit in making accurate estimates for the regions in their own right.

Two places to look for possible improvements are (a) those regions which contribute the largest numbers to the national total and (b) those regions which were forecast poorly.

Over 50 percent of the national traffic fatalities for 1972 were contributed by Regions 4, 5 and 6 (see Table 4). Regions 9 and 3 contribute another 20 percent of the total. If one were to look for improvements in forecasting national traffic fatalities, these are the regions that should be examined for homogeneity of data characteristics.

If Regions 4, 5 and 6 were forecast accurately, say by the GEXSMO program using the lumped mode, and the other regions remained unchanged, then the total error in the forecast could be reduced from 604 to 70 (out of 54889) if monthly data were used. The reduction in error would be from 478 to -110, from 993 to 234, and from -417 to -101 if quarterly, half-year or yearly data were employed, respectively. These might be looked upon as upper bounds, it being presumptuous to assume that the forecast could be made with no error. However, if the accuracy of the forecasts for these three regions alone were improved by a factor of 2, i.e., from slightly over 2 percent average error to slightly over 1 percent average error, then the total error in the national forecast would have been reduced to 137, 184, 663 and -259 for the four data aggregations M, Q, H and Y, respectively, using the GEXSMO program.

Table 4. Order of Regions by Number of 1972 Traffic Fatalities

Order   Region   1972 Fatalities   Percent of Total   Cum %
  1        4          11998             21.86          21.86
  2        5          10363             18.88          40.74
  3        6           6862             12.50          53.24
  4        9           6189             11.28          64.52
  5        3           5177              9.43          73.95
  6        2           4508              8.21          82.16
  7        7           3477              6.33          88.49
  8        8           2182              3.98          92.47
  9        1           2143              3.90          96.37
 10       10           1990              3.62          99.99
Total                 54889
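The bounds above follow from simple arithmetic on the Σ Regions error, which is the sum of the regional errors. A sketch, assuming that "improved by a factor of 2" means the combined error of the selected regions is halved; the function name is illustrative.

```python
# Illustrative helper; assumes the selected regions' combined error shrinks by the given factor.


def national_error_after_improvement(total_error, error_if_exact, improvement_factor):
    """Sigma Regions error after dividing the selected regions' combined error by a factor.

    total_error:    error of the Sigma Regions forecast (sum of regional errors)
    error_if_exact: the same total with the selected regions' errors set to zero
    """
    selected = total_error - error_if_exact  # combined error of the selected regions
    return error_if_exact + selected / improvement_factor


# Quarterly and yearly GEXSMO figures for Regions 4, 5 and 6 quoted above.
print(national_error_after_improvement(478, -110, 2))   # quarterly data
print(national_error_after_improvement(-417, -101, 2))  # yearly data
```

Applied to the quarterly (478, -110) and yearly (-417, -101) figures, this reproduces the halved-error totals of 184 and -259.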

Before any calculations were made, the output of the NHTSA time-series program TIMSR4 was examined by state for each region. The seasonality factors were scanned to determine whether a particular region could be broken down into sub-regions with similar seasonality-factor patterns. The results of this exercise are given in Table 5. Using these definitions of sub-regions as a guideline, calculations of regional fatalities were made using the GEXSMO program with lumped quarterly data, L(Q), for the sub-regions 4.2, 5.1 and 5.2. Since sub-region 4.1 consisted of one state (Florida), this calculation was already at hand in the SS(Q) set of calculations. Adding the results for sub-regions 4.1 and 4.2, the new estimate of fatalities for Region 4 became 12126, an overforecast by 83 fatalities, with fractional error Δ/S* = .0178 in both cases. These are an improvement from Δ = 272 to Δ = 183 when comparison was made to the SS(Q) mode for Region 5.

No  recalculations  were  made  for  Region  6,  the  region  with  the 
third  highest  fatality  total  for  1972,  because  there  did  not  appear 
to  be  any  sub-regions  distinguishable  from  the  entire  region  on  the 
basis  of  seasonality  factors. 
Table 5. Tentative New Regions Based on Similarity of Seasonality Factors from TIMSR4

Region*   New Region   States
4         4.1          Florida
          4.2          Others: Alabama, Georgia, Kentucky, Mississippi, North Carolina, South Carolina, Tennessee
5         5.1          Minnesota, Wisconsin
          5.2          Illinois, Indiana, Ohio, Michigan
6         6.0          No change: Arkansas, Louisiana, New Mexico, Oklahoma, Texas
9         9.1          California, Arizona
          9.2          Hawaii
          9.3          Nevada
3         3.1          District of Columbia
          3.2          Others: Delaware, Maryland, Pennsylvania, Virginia, West Virginia
2         2.0          No change: New Jersey, New York, Puerto Rico**
7         7.0          No change: Iowa, Kansas, Missouri, Nebraska
8         8.0          No change: Colorado, Montana, North Dakota, South Dakota, Utah, Wyoming
1         1.1          Maine, Vermont
          1.2          Rhode Island
          1.3          Connecticut, Massachusetts, New Hampshire
10        10.1         Alaska
          10.2         Idaho
          10.3         Oregon, Washington

* In order of decreasing magnitude of traffic fatalities for 1972.
** Puerto Rico not included in this study.
Region 9, the fourth highest in 1972, was the best-forecast region by both the GEXSMO and EXPSMOOTHING programs. There did not appear to be much chance of picking up further improvement in the forecasts for this region.

Table 6 gives a comparison of forecasts for 1972 by region by each of the methods using 12 years and 6 months of data. The comparisons are on the basis of the fractional error (Δ/S*). In this table, which contains the data of Tables 1 and 2 but is presented to emphasize regions, entries are marked B to indicate the "best" mode of forecasting each region's fatalities. The "best" mode is that for which the absolute value of the fractional error is smallest. On this basis, for the GEXSMO program forecasts, the best mode of forecasting is L(M) for Region 1, L(H) for Region 2, L(Q) for Region 3, SS(M) for Region 4, etc. The most frequent "best" mode is L(Q), with a frequency of 3 out of 10. The mode with the lowest mean absolute fractional error is also L(Q). The modes with no best regional forecasts are SS(Q) and SS(H). Region 8 had the worst "best" forecast, with Δ/S* for this region being -.0197. On the basis of mean absolute fractional error, the region with the best forecasts is Region 9.
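The selection rule for the "best" mode, and the tallies derived from it, can be sketched as below. The regions, modes and error values are hypothetical, and the function names are illustrative; the input maps each region to its signed fractional errors by mode.

```python
from collections import Counter
from statistics import mean

# Illustrative helper names; the error table below is hypothetical.


def best_modes(errors):
    """For each region, the mode whose signed fractional error is smallest in absolute value.

    errors maps region -> {mode: Delta/S*}; ties go to the first mode listed.
    """
    return {region: min(by_mode, key=lambda mode: abs(by_mode[mode]))
            for region, by_mode in errors.items()}


def best_mode_frequency(errors):
    """How many regions each mode forecast best."""
    return Counter(best_modes(errors).values())


def mode_mae(errors):
    """Mean absolute fractional error of each mode over all regions."""
    modes = next(iter(errors.values())).keys()
    return {mode: mean(abs(by_mode[mode]) for by_mode in errors.values()) for mode in modes}


errors = {"1": {"L(M)": 0.002, "L(Q)": -0.010},
          "2": {"L(M)": 0.030, "L(Q)": -0.004},
          "3": {"L(M)": -0.020, "L(Q)": 0.005}}
print(best_modes(errors), best_mode_frequency(errors), mode_mae(errors))
```

Because the sign of Δ/S* is discarded only when comparing, the signed errors remain available for reporting over- and underforecasts.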

The ranking order for each of the regions is also given in Table 14. A rank of 4, for example, means that the region (in this case Region 1) was the 4th best-forecast region; a rank of 10 indicates the worst forecast on the average. This dubious distinction went to Region 8 for all three methods: GEXSMO, EXPSMOOTHING and the NHTSA methods.
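A ranking of regions by average accuracy, of the kind described above, can be sketched by sorting the regions on their mean absolute fractional error over all modes, with rank 1 for the best-forecast region. The data and function name are hypothetical.

```python
from statistics import mean

# Illustrative helper; the error table below is hypothetical.


def rank_regions(errors):
    """Rank regions by mean absolute fractional error over all modes (1 = best forecast)."""
    mae = {region: mean(abs(e) for e in by_mode.values()) for region, by_mode in errors.items()}
    order = sorted(mae, key=mae.get)
    return {region: rank for rank, region in enumerate(order, start=1)}


errors = {"1": {"a": 0.010, "b": -0.020},
          "8": {"a": -0.020, "b": -0.030},
          "9": {"a": 0.001, "b": -0.002}}
print(rank_regions(errors))
```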
While 8 modes of forecasting were run with GEXSMO, only three modes were run using the EXPSMOOTHING program: L(Q), L(H) and L(Y). On the basis of frequency of best regional forecast, L(H) had a frequency of 6, L(Q) had 3 and L(Y) had 1.

Results for the two possible NHTSA methods, the Sum of States method ($N_S$) and the lumping method ($N_L$), were fairly evenly divided, 6 to 5.

The frequency of first, second and third "best" regional forecasts by procedure is given in Table 7.

Table 7. Frequency of First, Second and Third Best Procedure (Regions)

          GEXS   EXPS   N
First       5      3    2
Second      4      4    2
Third       1      3    6

On the basis of this sample, if one were interested in the accuracy of regional forecasts, it would appear that one of the time series programs should be preferred over the current N methods.
The second way to improve forecasts would be to improve the accuracy of the worst-forecast regions, provided, of course, that the improvement would be large enough to affect the national values significantly. Since the current national traffic fatality total is near 55,000, a change of 550 would change the national forecast by approximately one percent, a change of 275 by approximately 0.5 percent, and a change of 110 by approximately 0.2 percent. In our notation the fractional errors would be .01, .005 and .002, respectively.
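The conversion from a change in fatalities to a fractional change in the national forecast is a single division; the same calculation shows how little a single small region's error weighs nationally. A sketch with an illustrative function name, using the 55,000 round figure from the text and, for Region 8, the 1972 totals of Table 4.

```python
# Illustrative helper name.


def national_fraction(change, national_total=55_000):
    """Fractional change in the national forecast caused by a change of `change` fatalities."""
    return change / national_total


# The three changes quoted above: about 1, 0.5 and 0.2 percent of a 55,000 total.
for change in (550, 275, 110):
    print(change, national_fraction(change))

# A 2 percent error on Region 8's 2182 fatalities, relative to the 1972 total of 54889.
print(national_fraction(0.02 * 2182, national_total=54889))
```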

Region 8 was the worst-forecast region by each of the procedures. The total fatalities for 1972 were 2182, or approximately 4 percent of the national value. The best forecast by any method is in error by close to 2 percent, or about 44 fatalities. But this is less than 0.1 percent of the national value, so we abandoned any effort to make special progress on this region. We remark in passing that Region 8 was underforecast by all of the GEXSMO and EXPSMOOTHING runs, and that Region 7 was underforecast by all the GEXSMO runs.

The second-worst-forecast region is Region 2, which contributes about 8 percent of the national total. However, this region was forecast very well by GEXSMO using L(H), by EXPSMOOTHING using L(H), and by both N methods. Table 5 indicates there is no need to break the region into sub-regions. Also, this case has already been studied in the Sum of States mode, since the region consists of only 2 states*: New York and New Jersey.

*Puerto Rico was not included in this study.
3.0 FORECASTS OF TOTAL U.S. TRAFFIC FATALITIES USING NO DATA FROM CURRENT YEAR

3.1 Modifications Needed

To provide forecasts for calendar year 1973, two things were necessary: (1) to modify the NHTSA method slightly, since data for the current calendar year are not available, and (2) to obtain some forecasts for previous years using these modified methods, together with forecasts by the exponential smoothing methods using no current-year data. The latter is needed to get some feel for the error involved in the 1973 forecasts.

To obtain forecasts for this year based on last year's data, two modifications of the NHTSA method were used. The first was

$\dfrac{\hat{Y}_{0+1}^{(K)}}{Y_0} = \dfrac{\frac{1}{K}\sum_{i=1}^{K} Y_{-i+1}}{\frac{1}{K}\sum_{i=1}^{K} Y_{-i}}$

The right side of this equation represents the ratio of average pairs. For 1973, K runs up to 12, while for 1972 it runs up to 11. The second method was

0+1      1  -i+1  ^.         ^  T 

  -  —}  ^    =  average  of  K  ratios,  K  =  1, .  .  .  . 

m^        K  ■'—1    m  ^ 

Each  of  these  methods  generates  K  estimates 

;(1)       (2)  (K) 
0+1'     0+1' ' • * '0+1 

and  one  can  take  the  mean  of  these  K  estimates  to  provide  other 

estimates.  . 
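The two modified ratio methods, the set of K estimates and their mean can be sketched as follows. The yearly totals in the example are hypothetical, and the list layout (most recent complete year first) and function names are assumptions of the illustration.

```python
from statistics import mean

# Illustrative helper names; y[0] is Y_0 (last complete year), y[1] is Y_{-1}, and so on.


def ratio_of_average_pairs(y, k):
    """Y_hat_{0+1}^(K) = Y_0 * sum_{i=1}^K Y_{-i+1} / sum_{i=1}^K Y_{-i}."""
    if not 1 <= k < len(y):
        raise ValueError("need 1 <= k < len(y)")
    return y[0] * sum(y[0:k]) / sum(y[1:k + 1])


def average_of_ratios(y, k):
    """Y_hat_{0+1}^(K) = Y_0 * (1/K) * sum_{i=1}^K Y_{-i+1} / Y_{-i}."""
    if not 1 <= k < len(y):
        raise ValueError("need 1 <= k < len(y)")
    return y[0] * mean(y[i - 1] / y[i] for i in range(1, k + 1))


def all_k(method, y):
    """The estimates for K = 1 .. len(y) - 1, and their mean."""
    estimates = [method(y, k) for k in range(1, len(y))]
    return estimates, mean(estimates)


# Hypothetical yearly totals: Y_0, Y_{-1}, Y_{-2}.
y = [110.0, 100.0, 80.0]
print(all_k(ratio_of_average_pairs, y))
print(all_k(average_of_ratios, y))
```

For K = 1 both methods reduce to $Y_0^2/Y_{-1}$; they differ once two or more year-to-year ratios are combined.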

Using difference techniques we obtained

$\hat{Y}_{0+1} - Y_0 = Y_0 - Y_{-1}$, or $\hat{Y}_{0+1} = Y_0 + (Y_0 - Y_{-1})$.
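A sketch of the modified difference method and of a year-by-year back-test of the kind tabulated in Table 12. The helper names and the sample totals are hypothetical.

```python
# Illustrative helper names; sample totals are hypothetical.


def modified_difference_forecast(y0, y_prev):
    """Y_hat_{0+1} = Y_0 + (Y_0 - Y_{-1}): next year's change repeats this year's change."""
    return y0 + (y0 - y_prev)


def one_year_ahead_errors(yearly):
    """Fractional errors of the modified difference method from the third year on.

    yearly[t] is the total for year t; the forecast for year t uses years t-1 and t-2.
    """
    return [(modified_difference_forecast(yearly[t - 1], yearly[t - 2]) - yearly[t]) / yearly[t]
            for t in range(2, len(yearly))]


print(one_year_ahead_errors([100, 110, 115, 130]))
```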
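A simple variant on this is

$\hat{Y}_{0+1}^{(K)} - m_0 = \dfrac{1}{K}\sum_{i=0}^{K-1} (Y_{-i} - m_{-i})$, K = 1, 2, ....

Another is

$\hat{Y}_{0+1}^{(K)} = Y_0 + \dfrac{1}{K}\sum_{i=0}^{K-1} (m_{-i} - m_{-i-1})$, K = 1, 2, ...,

which for K = 1 reduces to $\hat{Y}_{0+1}^{(1)} = Y_0 + m_0 - m_{-1}$.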

Results of employing some of these modifications to obtain forecasts for the current year and previous years are presented in Tables 8-12. In each of these tables m refers to the lumped data for the last 6 months of the year and Y refers to the yearly total.

Forecasts for 1973, 1972 and 1971 using the method of the ratio of K average pairs are presented in Tables 8 and 9. For 1973, the low value is obtained for K = 3, the high value for K = 11, and the K corresponding to the mean value is between 6 and 7. For 1972, the low value is obtained for K = 2, the high for K = 10, and the K corresponding to the mean value is between 5 and 6. For 1971, the values are K = 1, K = 9, and between 7 and 8. For both 1971 and 1972 the observed value is located in the same interval as the mean value of the K forecasts.

Similar results were obtained using the method of the average of K ratios. See Tables 10 and 11.

Results by the simple difference method (modified) for the years 1962-1973 are presented in Table 12.

To reduce the amount of computation, the exponential smoothing programs were run only in the lumped mode. Forecasts for 1973 are presented in Table 13, while the corresponding forecasts and errors for 1972 are given in Tables 14 and 15. In addition to forecasting for each region, Σ Regions and the Entire U.S. by M, Q, H and Y, an average of these four methods was taken. This is denoted by $\bar{L}$, where $\bar{L} = \frac{1}{4}[L(M) + L(Q) + L(H) + L(Y)]$.

Using GEXSMO, the best estimate of the national traffic fatalities for 1972 was obtained using L(H), and identical estimates were obtained for Σ Regions and Entire U.S. by L(H). Moreover, all of the forecasts for 1972 by Σ Regions and Entire U.S. had errors of less than 1 percent, the exception being Entire U.S. by L(Y). Excluding this outlier of -.0315, all the GEXSMO fractional errors fell in the interval [.0019, .0089].

Using EXPSMOOTHING, the best estimate of the national traffic fatalities for 1972 was obtained using Σ Regions and L(Q). (Tables 14, 15)

Results for the NHTSA (modified) method applied to the Entire U.S. for 1972 range from 54121 to 55870, with mean fractional errors of .0046 and .0053. (Tables 14, 15)

A grand average of the 25 estimates for 1972, not all independent, slightly underforecast the national traffic fatality value, by -265 (-.0048). (Tables 14, 15)
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h . 0  CONCLUSIONS 

Based  on  the  computations  using  12  years  and  6  months  of  data  to 
estimate  the  national  highway  traffic  fatalities  for  the  last  6  months  of 
1972  (and  thus  the  total  calendar  year  traffic  fatalities),  there 
appears  to  be  some  advantage  in  using  lumped  data  for  each  region 
aggregated  either  quarterly  or  half  yearly,  and  using  the  Sma.  of  Regions 
Method  to  estimate  the  national  value. 

There does not appear to be any great difference between the results obtained using the non-decompositional methods and those obtained by time-series analysis programs based on exponential smoothing methods. Only a few of the vast number of possible variations of the non-decompositional methods were tried, and their results compared favorably with those obtained using exponential smoothing. Of the K forecasts possible with the NHTSA methods (or slight modifications), good national estimates were obtained by averaging over the K forecasts.

There does not appear to be any great gain in creating new sub-regions of existing regions on the basis of homogeneity of seasonality factors, although some benefit was evident. Although not investigated here, a redefinition of the regions by homogeneity of data properties might lead to better forecasts for the nation.

Several regions (notably Regions 8 and 2) were consistently forecast poorly by all methods employed. However, the forecasts for Region 2 were in error by less than one percent under the NHTSA methods $N^{(12)}$ and $N^{(13)}$. By contrast, Region 9 was forecast well by all methods tried.
On the basis of the sample taken, if accuracy of regional forecasts is the goal, one of the time-series programs would appear preferable to the current N methods.

To forecast national highway traffic fatalities for 1973, the non-decompositional methods used by NHTSA had to be modified slightly, since no data for the early months of 1973 were at hand. Using these modified methods and the two exponential smoothing methods programmed to yield yearly rather than half-year forecasts, 25 estimates, not necessarily independent, were made for calendar year 1973. These ranged from a low of 54186 to a high of 55994, with a mean of 55055. The corresponding estimates by the same methods for 1972 ranged from a low of 53166 to a high of 55537, with a mean of 54624. Observed national traffic fatalities for 1972 were 54889.




APPENDIX 

COMMENTS  ON  FORECASTING  TECHNIQUES  AND  COMPUTER  PROGRAMS 


Time series analysis generally goes through a decomposition process to isolate the trend; in this process, the seasonal and other regular variations are identified. A future time series is reconstituted (i.e., extrapolated) by extrapolating the trend and then imposing the corresponding cyclical variations. Because NHTSA needs only the annual total, the requirement is less severe: the annual total can be projected without direct consideration of seasonal or other cyclical variations.
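For readers less familiar with the decomposition approach described above, the following Python sketch shows one classical variant: multiplicative seasonal indices from the ratio of each observation to a centered moving average, a least-squares linear trend fitted to the deseasonalized series, and a forecast obtained by extrapolating the trend and reimposing the seasonal indices. It illustrates the idea under those assumptions; it is not one of the programs (GEXSMO, EXPSMOOTHING) compared in this report, and all function names and data are hypothetical.

```python
def centered_moving_average(x, period):
    """2 x period centered moving average; period is assumed even (4, 12).
    Entries too close to either end to be averaged are None."""
    half = period // 2
    out = [None] * len(x)
    for t in range(half, len(x) - half):
        w = x[t - half:t + half + 1]  # period + 1 consecutive values
        out[t] = (0.5 * w[0] + sum(w[1:-1]) + 0.5 * w[-1]) / period
    return out


def seasonal_indices(x, period):
    """Average ratio of each observation to its moving-average trend, by
    season, rescaled so that the indices average to 1."""
    if len(x) < 2 * period:
        raise ValueError("need at least two full cycles of data")
    cma = centered_moving_average(x, period)
    ratios = [[] for _ in range(period)]
    for t, (value, trend) in enumerate(zip(x, cma)):
        if trend is not None:
            ratios[t % period].append(value / trend)
    raw = [sum(r) / len(r) for r in ratios]
    scale = sum(raw) / period
    return [r / scale for r in raw]


def linear_fit(ys):
    """Least-squares intercept and slope of ys against t = 0, 1, 2, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def decompose_and_forecast(x, period, horizon):
    """Forecast the next `horizon` values: extrapolate the trend of the
    deseasonalized series, then reimpose the seasonal indices."""
    idx = seasonal_indices(x, period)
    deseasonalized = [v / idx[t % period] for t, v in enumerate(x)]
    intercept, slope = linear_fit(deseasonalized)
    n = len(x)
    return [(intercept + slope * t) * idx[t % period]
            for t in range(n, n + horizon)]


# Hypothetical quarterly series: constant level 100 with a seasonal pattern.
seasonal = [0.9, 1.1, 1.0, 1.0]
quarters = [100 * seasonal[t % 4] for t in range(12)]
next_year = decompose_and_forecast(quarters, period=4, horizon=4)
print([round(v, 6) for v in next_year])  # [90.0, 110.0, 100.0, 100.0]
```

Summing the forecast quarters gives an annual total; the appendix's point is that when only that total is needed, the seasonal step can be bypassed.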

The current NHTSA procedures use the "non-seasonal" or "non-decompositional" approach, but do so in a very straightforward way, with one assumption and variants of it. The assumption NHTSA uses is one of proportionality: this year's fatalities are to last year's fatalities as the cumulative total up to this time this year is to the corresponding cumulative total last year. The variants use the average of the last k years of cumulative totals instead of just last year's. Further variations not now among the NHTSA methods could be tried, such as various weighted averages (exponential smoothing) of the last k years, just like the weighting schemes in time series analysis.
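In the report's notation the proportionality assumption reads $\hat{S}^*_0 / S^*_{-1} = m_0 / m_{-1}$, where $m_0$ is this year's cumulative total for the specified months and $m_{-1}$, $S^*_{-1}$ are last year's cumulative and calendar-year totals. The Python sketch below implements this, together with one possible reading of the k-year and weighted variants: a (weighted) average of the past ratios $S^*_{-j}/m_{-j}$. That averaging scheme, the function names, and the toy numbers are assumptions for illustration; the NHTSA variants may differ in detail.

```python
def proportional_forecast(m0, history, weights=None):
    """Calendar-year forecast under the proportionality assumption.

    m0      -- this year's cumulative total for the specified months
    history -- (annual_total, cumulative_total) pairs for past years,
               most recent year first
    weights -- optional weight per past year (default: equal weights)

    With one past year this is S*_{-1} * m0 / m_{-1}.
    """
    ratios = [annual / cumulative for annual, cumulative in history]
    if weights is None:
        weights = [1.0] * len(ratios)
    if len(weights) != len(ratios):
        raise ValueError("need exactly one weight per past year")
    factor = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return m0 * factor


def exponential_weights(k, alpha):
    """Hypothetical helper: exponential-smoothing style weights
    alpha * (1 - alpha)**j for j = 0 .. k-1, most recent year first."""
    return [alpha * (1 - alpha) ** j for j in range(k)]


# Hypothetical toy numbers: (annual total, cumulative total) per past year.
history = [(100.0, 50.0), (90.0, 30.0)]
print(proportional_forecast(60.0, history[:1]))  # 120.0 (last year only)
print(proportional_forecast(60.0, history))      # 150.0 (equal weights, k = 2)
print(proportional_forecast(60.0, history, exponential_weights(2, 0.5)))
```

The last call weights the most recent year twice as heavily as the year before, giving a factor of 7/3 and a forecast of about 140.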

A different basic assumption would be one of additivity instead of proportionality. The simplest procedure under this assumption is to take the fatalities over the latest one-year period. In your notation,

$$\hat{S}^*_0 = m_0 + \left(S^*_{-1} - m_{-1}\right),$$

where the second term is the fatalities last year over the period complementary to $m_0$. Many variations can be imposed on this, such as various weighting schemes on the prior years' data. This additive scheme is more conservative, but may be most suitable when the third-quarter data are in. For the first-quarter data, this procedure may be too conservative when there is a significant trend.
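A matching Python sketch of the additive scheme $\hat{S}^*_0 = m_0 + (S^*_{-1} - m_{-1})$ follows, with an optional weighted average of several past years' complementary-period totals as one of the possible variations. The function name and toy numbers are hypothetical. With these numbers the proportional rule $S^*_{-1} m_0 / m_{-1}$ would give 100 × 60 / 50 = 120, so the additive result of 110 illustrates why the additive scheme is described as more conservative when this year is running ahead of last year.

```python
def additive_forecast(m0, history, weights=None):
    """Calendar-year forecast under the additivity assumption.

    m0      -- this year's cumulative total for the specified months
    history -- (annual_total, cumulative_total) pairs for past years,
               most recent year first
    weights -- optional weight per past year (default: equal weights)

    With one past year this is m0 + (S*_{-1} - m_{-1}): the current partial
    total plus last year's fatalities over the complementary period.
    """
    remainders = [annual - cumulative for annual, cumulative in history]
    if weights is None:
        weights = [1.0] * len(remainders)
    if len(weights) != len(remainders):
        raise ValueError("need exactly one weight per past year")
    expected_rest = sum(w * r for w, r in zip(weights, remainders)) / sum(weights)
    return m0 + expected_rest


# Hypothetical toy numbers: (annual total, cumulative total) per past year.
history = [(100.0, 50.0), (90.0, 30.0)]
print(additive_forecast(60.0, history[:1]))  # 110.0 (last year only)
print(additive_forecast(60.0, history))      # 115.0 (equal weights, k = 2)
```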

Clearly, it is possible to take combinations of the additive and proportional schemes. All of these procedures would be very simple methods. . . .
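One simple way to combine the two schemes, offered here only as an assumption, is a convex blend with weight w on the one-year proportional forecast and 1 - w on the one-year additive forecast. The weight could, for instance, be reduced as more of the year's data come in, in line with the remark that the additive scheme suits third-quarter data. The function name, the weight, and the numbers are hypothetical.

```python
def blended_forecast(m0, annual_prev, cumulative_prev, w):
    """Convex combination of the one-year proportional and additive forecasts.

    w = 1 gives the proportional rule S*_{-1} * m0 / m_{-1};
    w = 0 gives the additive rule m0 + (S*_{-1} - m_{-1}).
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    proportional = annual_prev * m0 / cumulative_prev
    additive = m0 + (annual_prev - cumulative_prev)
    return w * proportional + (1.0 - w) * additive


# Hypothetical: last year 100 in total and 50 by this date; 60 so far this year.
print(blended_forecast(60.0, 100.0, 50.0, w=1.0))  # 120.0
print(blended_forecast(60.0, 100.0, 50.0, w=0.0))  # 110.0
print(blended_forecast(60.0, 100.0, 50.0, w=0.5))  # 115.0
```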

A further possibility is to combine either of the "non-decompositional" approaches with that of time series analysis, principally the trend analysis portion. If the trend projection is imposed on the procedures with the additive assumption, the result will be somewhat akin to the procedures using time series analysis. If it is imposed on the proportionality procedure, the concept will be quite different, as might be the results.
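As one possible reading of imposing a trend projection on the additive procedure, the Python sketch below replaces last year's complementary-period fatalities with a least-squares linear trend fitted to several past years of those totals and extrapolated one year ahead. This interpretation, the function name, and the numbers are assumptions for illustration only.

```python
def trend_additive_forecast(m0, past_remainders):
    """m0 plus a linear-trend projection of the complementary-period total.

    past_remainders -- annual total minus cumulative total (the part of the
                       year not yet observed) for past years, oldest first
    """
    n = len(past_remainders)
    if n < 2:
        raise ValueError("need at least two past years to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(past_remainders) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(past_remainders))
    slope = sxy / sxx
    projected_rest = y_mean + slope * (n - t_mean)  # trend value at t = n
    return m0 + projected_rest


# Hypothetical: complementary-period totals of 40, 42, 44 in the last three years.
print(trend_additive_forecast(60.0, [40.0, 42.0, 44.0]))  # 106.0
```

With a flat history the projection reduces to the average remainder, so the sketch then behaves like the equal-weight additive variant.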

Then there is the wide gamut of procedures that use other data in addition to the fatalities. Related data such as auto registrations, the number of teenaged drivers, etc., could be used as additional predictors. Such procedures could soon extend beyond the "simple" ones, but would certainly be the appropriate approach to investigate if the extrapolation extends much beyond the current annual total.

These (edited) comments have been graciously furnished by Dr. George Suzuki of the Technical Analysis Division, National Bureau of Standards. As indicated in the main report and the list of symbols and formulae, several of these methods have been tried, but an exhaustive study was not made.